Contemporary paternal genetic landscape of Polish
and German populations: from early medieval Slavic 
expansion to post-World War II resettlements 


Krzysztof Rębała, Begoña Martínez-Cruz, Anke Tönjes, Peter Kovacs, Michael Stumvoll, Iris Lindner,
Andreas Büttner, H-Erich Wichmann, Daniela Siváková, Miroslav Soták, Lluís Quintana-Murci,
Zofia Szczerkowska, David Comas and the Genographic Consortium


Homogeneous Proto-Slavic genetic substrate and/or extensive mixing after World War II were suggested to explain homogeneity 
of contemporary Polish paternal lineages. Alternatively, Polish local populations might have displayed pre-war genetic 
heterogeneity owing to genetic drift and/or gene flow with neighbouring populations. Although sharp genetic discontinuity along 
the political border between Poland and Germany indisputably results from war-mediated resettlements and homogenisation, it 
remained unknown whether Y-chromosomal diversity in ethnically/linguistically defined populations was clinal or discontinuous 
before the war. In order to answer these questions and elucidate early Slavic migrations, 1156 individuals from several Slavic 
and German populations were analysed, including Polish pre-war regional populations and an autochthonous Slavic population 
from Germany. Y chromosomes were assigned to 39 haplogroups and genotyped for 19 STRs. Genetic distances revealed similar 
degree of differentiation of Slavic-speaking pre-war populations from German populations irrespective of duration and intensity 
of contacts with German speakers. Admixture estimates showed minor Slavic paternal ancestry (~20%) in modern eastern
Germans and hardly detectable German paternal ancestry in Slavs neighbouring German populations for centuries. BATWING
analysis of isolated Slavic populations revealed that their divergence was preceded by rapid demographic growth, undermining
the theory that the Slavic expansion was primarily a linguistic rather than a population spread. Polish pre-war regional populations showed
within-group heterogeneity and lower STR variation within R-M17 subclades compared with modern populations, which might 
have been homogenised by war resettlements. Our results suggest that genetic studies on early human history in the Vistula and 
Oder basins should rely on reconstructed pre-war rather than modern populations. 
The male genetic landscape of the European continent has been
shown to be clinal and influenced primarily by geography rather than
by language.1 One of the most outstanding phenomena in the
Y-chromosomal diversity in Europe concerns the population of 
Poland, which reveals geographic homogeneity of Y-chromosomal 
lineages in spite of a relatively large geographic area occupied by the
Polish state.? Moreover, a sharp genetic border has been identified 
between paternal lineages of neighbouring Poland and Germany, 
which strictly follows a political border between the two countries.? 
Massive human resettlements during and shortly after World War
II (WWII), involving millions of Poles and Germans, have been
proposed as an explanation for the observed phenomena.” Thus, it 
was possible that the local Polish populations formed after the early 
Slavic migrations displayed genetic heterogeneity before the war 
owing to genetic drift and/or gene flow with neighbouring
populations. Alternatively, the
homogeneity of Polish paternal lineages might have existed already before the
war owing to a common genetic substrate inherited from the ancestral 
Slavic population after the Slavs’ early medieval expansion in Europe.” 

From the linguistic point of view, western Slavic dialects are 
classified as Czech/Slovak, Lusatian and Lekhitic; the Lekhitic branch 
is further divided into Polish, Pomeranian and Polabian.* Nowadays, 
among the western Slavs, only Polish and Czech/Slovak dialects have 
evolved into fully viable languages with millions of speakers. Lusatian 
is spoken by 66000 Sorbs inhabiting southeastern Germany, down 
from 166000 speakers in the late 19th century. Present-day 
Pomeranian comprises 53000 speakers of Kashubian in northern 
Poland,® although roughly half a million people in Poland claim 
Kashubian and half Kashubian ancestry.” While Slavists classify 
Kashubian as a separate Slavic language,’ the vast majority of 
Kashubes declare Polish ethnicity.° Polabian was spoken until the 
18th century in what is now northeastern Germany.® The Polish
linguistic area is further subdivided into four dialectal groups, roughly 
corresponding to early Slavic tribal division: Greater Polish, Lesser 
Polish, Silesian and the most linguistically divergent Masovian. 

There exists an opinion among academics that ‘the Slavic ethnogenesis
remains a major, if not the most important, topic in the
historiography of Eastern Europe’.!® Most of the current knowledge
on this subject results from indirect evidence based on linguistics, 
archaeology and anthropology, including, since recently, molecular 
genetics.11 The changes seen in the 5th–6th centuries in eastern
Europe are explained either in terms of a demographic expansion of 
the Slavic people, carrying with them their genes, customs and 
language, or as a primarily linguistic spread with only minor 
contribution of migration.!* 

We used high-resolution typing of Y-chromosomal binary and 
microsatellite markers first to test for male genetic structure in the 
Polish population before massive human resettlements in the 
mid-20th century, and second to verify whether the observed present-day
genetic differentiation between the Polish and German paternal
lineages is a direct consequence of WWII or has rather resulted
from a genetic barrier between peoples with distinct linguistic
backgrounds. The study further addresses the origin of
the expansion of the Slavic language in early medieval
Europe. For the purpose of our investigation, we have sampled three
pre-WWII Polish regional populations, three modern German 
populations (including the Slavic-speaking Sorbs) and a modern 
population of Slovakia. 


MATERIALS AND METHODS 

A total of 1156 individuals were analysed in the present study, including 520 
unrelated males descending directly from pre-WWII native inhabitants of three 
distinct ethnolinguistic regions of Poland: Kaszuby (Kashubian-speaking
region, n = 204), Kociewie (Greater Polish-speaking region, n = 158) and
Kurpie (Masovian-speaking region, n = 158). Inhabitants of the Kurpie region
trace their origin to Masovian peasants who since the 16th century colonised
forests between Masovia and Prussia, and were subjected to some degree of
geographic and cultural isolation.? The Kashubian samples were additionally
assigned to three different dialects:? northern (n = 70), central (n = 93) and
southern (n = 41). As genetic distances revealed the three Kashubian
subpopulations to be genetically indistinguishable (data not shown), they
were treated in many subsequent analyses as one population. Only individuals
whose paternal-line ancestors were born in villages and had inhabited the studied areas for at
least three generations were selected for the study. In
addition, a sample set from Germany comprised Sorbs from Lusatia (Upper
Sorbian speakers, n = 123) and Germans from Mecklenburg (northeastern
Germany, n = 131) and western Bavaria (southwestern Germany, n = 218).
Finally, DNA samples from western Slovakia (n = 164), used previously in a
comprehensive analysis of Y-STR variation in the Slavic populations,11 were
also included in the study. The studied populations and their linguistic
background are summarised in Table 1, while their geographic locations on an
ethnolinguistic map of central Europe in the early 20th century are shown in
Supplementary Figure S1.

Two multiplex PCRs were utilised to genotype a total of 19 Y-STRs,
including 17 STRs present in the commercially available AmpFlSTR Yfiler PCR
Amplification Kit (Applied Biosystems, Foster City, CA, USA). The second
multiplex comprised two additional Y-STRs, DYS388 and DYS426, as well as
six biallelic markers displaying amplified fragment length polymorphism:
A-M91, BT-M139, B-M60, M-M186, O-M175 and R-M17. As the Yfiler kit
co-amplifies the two DYS385 loci without discriminating between them,
DYS385 was excluded from all the analyses performed, leaving a total of
17 Y-STRs (including DYS388 and DYS426) for inferences. Other Y-SNPs were
genotyped individually with the use of pre-designed TaqMan assays with
previously published primer sequences.14 The phylogenetic relationship of the haplogroups is
shown in Figure 1.

Observed haplogroup frequencies were employed to calculate a matrix of
pairwise FST values. Y-STR haplotypes were used to obtain ΦST and RST
molecular distances. Calculations of genetic distances, estimations of corresponding
P-values based on 10000 permutations and analysis of molecular
variance (AMOVA) were performed with the use of Arlequin 3.1 software.!°
In order to thoroughly explore the Y chromosome distribution in the Polish
population before and after WWII, our data were compared with 7-STR
haplotypes published for a pre-WWII southern Polish population from the
Lesser Polish-speaking regions of Podhale and Sądecczyzna (n = 140)16 and for a
number of modern Polish populations,16–18 including Kaszuby (n = 142) and
Podhale and Sądecczyzna (n = 226). Multidimensional scaling (MDS) based on
linearised distances!” was carried out with the use of STATISTICA 9.1 software 
(StatSoft, Tulsa, OK, USA). Network 4.6 software (Fluxus Technology, Clare, 
UK) was applied to build a median-joining network”? of Y-STR haplotypes with 
a maximum parsimony option.?! Mean pairwise differences (MPDs) within 
populations based on the 17-STR haplotypes and the weighted mean 
intralineage MPDs (WIMPs) were calculated as previously described.” STR 
variation within chosen haplogroups was assessed by genetic variance (Vp)?> and
by average squared difference in the number of repeats between all chromosomes
and a median haplotype, averaged over microsatellite loci (ASD0).”4
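The diversity measures named above can be illustrated with a minimal sketch. It assumes that MPD counts the loci at which two haplotypes differ, that Vp is the per-locus sample variance of repeat counts averaged over loci, and that the median haplotype is taken locus by locus; these definitions, the function names and the toy haplotypes are illustrative assumptions and do not reproduce the Arlequin implementation or the published calculations.

```python
from itertools import combinations
from statistics import median_low, variance


def mean_pairwise_difference(haplotypes):
    """Average number of loci at which two haplotypes differ, over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    return sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)


def genetic_variance(haplotypes):
    """Sample variance of repeat counts per locus, averaged over loci (assumed Vp)."""
    loci = list(zip(*haplotypes))
    return sum(variance(locus) for locus in loci) / len(loci)


def asd0(haplotypes):
    """Average squared difference from the median haplotype, averaged over loci."""
    loci = list(zip(*haplotypes))
    total = 0.0
    for locus in loci:
        med = median_low(locus)  # per-locus median allele (assumption)
        total += sum((x - med) ** 2 for x in locus) / len(locus)
    return total / len(loci)


# Toy data: four hypothetical chromosomes typed at three hypothetical loci.
toy = [(14, 12, 10), (14, 13, 10), (15, 12, 10), (14, 12, 11)]
print(mean_pairwise_difference(toy), genetic_variance(toy), asd0(toy))
```

In this toy set each locus carries a single one-step variant, so all three loci contribute equally to Vp and ASD0.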

The pre-WWII Polish samples were additionally divided into three 
subgroups, depending on surnames of the tested individuals. The first group 
comprised individuals carrying surnames with roots revealing Slavic/eastern 
European etymology or origin. Accordingly, males with surname roots 
indicating German/western European etymology or origin were included in 
the second group. The third group contained surnames with unclear or hybrid 
etymology. For each surname, the assignment was based on linguistic analysis 
provided in etymological dictionaries.2>*7 

Table 1 Linguistic affiliations, Y-STR MPD and WIMP values (± SD), and surname distributions for the analysed populations

Population             Linguistic affiliation               MPD             WIMP           Slavic vs German surnames
Kaszuby (n = 204)      W Slavic, Pomeranian, Kashubian      9.26 ± 4.27     5.07 ± 1.29    0.681 : 0.250
Kociewie (n = 158)     W Slavic, Polish, Greater Polish     9.30 ± 4.30     5.23 ± 1.15    0.791 : 0.177
Kurpie (n = 158)       W Slavic, Polish, Masovian           9.32 ± 4.30     4.70 ± 1.15    0.873 : 0.089
Lusatia (n = 123)      W Slavic, Lusatian, Upper Sorbian    8.24 ± 3.85     4.23 ± 1.31    -
Slovakia (n = 164)     W Slavic, Czech/Slovak, W Slovak     9.83 ± 4.52     4.92 ± 1.02    -
Mecklenburg (n = 131)  German                               10.04 ± 4.62    5.19 ± 0.82    -
Bavaria (n = 218)      German                               10.43 ± 4.77    5.50 ± 0.75    -

Abbreviations: MPD, mean pairwise difference; WIMP, weighted mean intralineage mean pairwise difference.

BATWING?® was used to estimate the times of demographic expansion and split of
the populations of Kaszuby and Lusatia. The time of the start of demographic
expansion, the growth rate and the time of population split were estimated using a
model of exponential growth from a constant-size ancestral population.
Observed mutation rates for each marker were used in the analysis.7? Y-STR
mutation data published in the Y Chromosome Haplotype Reference
Database*? and in the literature?*3! were used to set mutation rate priors as
provided in Supplementary Table S1. An initial effective population size and
growth rate were given priors of gamma(1.1,0.0001) and gamma(1.01,1),
respectively, in order to cover very wide ranges of possible values.7? Maximally
uninformative uniform priors were set for dates of the expansion start and
population split. SNP information was integrated for the phylogenetic
reconstruction, but it was not considered for posterior estimates. A total of 10
million Markov chain Monte Carlo (MCMC) samples were collected: the first 5
million were rejected as burn-in and the remaining 5 million were used for
inference. BATWING convergence was assessed from two independent runs with
different seeds with the use of Gelman and Rubin’s convergence diagnostic
available in the CODA package for R.3*4 In order to put the BATWING results
on a historical time scale, a male generation interval of 31 years*° was used.
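The convergence check described above can be sketched as follows. This is a minimal illustration of the basic Gelman and Rubin potential scale reduction factor for a single parameter, assuming equal-length post-burn-in chains; it omits the degrees-of-freedom correction applied by the CODA package, and the function name and toy chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-burn-in draws.
    Values close to 1 suggest that the chains sample the same distribution.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])    # W: mean within-chain variance
    between = n * variance(chain_means)             # B: between-chain variance
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return (pooled / within) ** 0.5


# Hypothetical draws from two runs with different seeds.
agreeing = [[1, 2, 3, 4], [1, 2, 3, 4]]
disagreeing = [[1, 2, 3, 4], [11, 12, 13, 14]]
print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```

A factor well above 1, as for the second toy pair, would indicate that the runs have not converged and that more sampling or a longer burn-in is needed.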

Populations speaking Sorbian and Kashubian, linguistically the most closely 
related to extinct Slavic dialects spoken in the past in present-day eastern 
Germany, were used to assess Slavic ancestry in the eastern German 
Y-chromosomal pool. In addition, German admixture was assessed in genetic 
outliers detected in the MDS analysis, that is, the Sorbs and Kashubes, with 
the Greater Polish-speaking population of Kociewie as the parental population 
(the Greater Polish dialects directly neighbour the Kaszuby region and 
share linguistic similarities with the Lusatian dialects’). For haplogroup data, 
genetic admixture estimators based on allele frequencies were assessed. An mR
estimator comparing directly haplogroup frequencies was computed with the
use of Admix 2.0.4° A maximum likelihood approach-based mL estimator
considering an effect of genetic drift in admixed and parental populations was
obtained with the aid of Leadmix software.*” As the overwhelming majority
of Y-STR haplotypes were singletons specific to only one population, in the case of
STR data, an mY estimator taking into account molecular distances between
haplotypes rather than haplotype frequencies was computed with the use of
Admix 2.0. In order to eliminate likely haplotype homoplasy, SNP phylogeny
was integrated into STR information, weighting biallelic mutations 1000-fold 
higher than STR mutations.*® The molecular relationship between haplotypes 
was defined as the sum of squared differences in allele sizes.** 
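Two ideas from the paragraph above can be sketched in code: a frequency-based admixture proportion and the weighted molecular distance between haplotypes. The sketch assumes only two parental populations and a simple least-squares fit, which is a simplification in the spirit of a frequency-based estimator and not the Admix 2.0 or Leadmix implementation; all function names and input values are hypothetical.

```python
def admixture_two_parents(hybrid, parent1, parent2):
    """Least-squares share of parent1 in the hybrid population.

    Each argument is a list of haplogroup frequencies in the same order.
    """
    num = sum((h - p2) * (p1 - p2) for h, p1, p2 in zip(hybrid, parent1, parent2))
    den = sum((p1 - p2) ** 2 for p1, p2 in zip(parent1, parent2))
    return num / den


def molecular_distance(str_a, str_b, snp_a, snp_b, snp_weight=1000):
    """Sum of squared allele-size differences, with SNP differences up-weighted."""
    str_part = sum((x - y) ** 2 for x, y in zip(str_a, str_b))
    snp_part = sum(x != y for x, y in zip(snp_a, snp_b))
    return str_part + snp_weight * snp_part


# Hypothetical frequencies of three haplogroups in two parental populations.
parent1 = [0.6, 0.3, 0.1]
parent2 = [0.2, 0.3, 0.5]
hybrid = [0.2 * a + 0.8 * b for a, b in zip(parent1, parent2)]
print(admixture_two_parents(hybrid, parent1, parent2))

# Same SNP background versus one SNP difference (hypothetical haplotypes).
print(molecular_distance((14, 12), (15, 10), (0, 1), (0, 1)))
print(molecular_distance((14, 12), (15, 10), (0, 1), (1, 1)))
```

With the 1000-fold weight, a single SNP difference outweighs any realistic accumulation of STR differences, so haplotypes from different haplogroups are never treated as close relatives.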


RESULTS 

A total of 39 different haplogroups have been detected in the studied 
sample set (Figure 1), including an insertion polymorphism at M91 
(M91insT with a stretch of 10 thymidines) previously observed in two
individuals from a large worldwide sample set.*? No derived alleles at 
R-M153 (a subclade of R-P312) and R-M222 (a subclade of R-L21) 
have been detected. Genotyping results for all 1156 individuals are 
provided in Supplementary Table S2. 

AMOVA in the studied populations revealed statistically significant 
support for two linguistically defined groups of populations in both 
haplogroup and haplotype distributions (Table 2). It also detected 
statistically significant genetic differentiation for both haplogroups 
and haplotypes in three Polish pre-WWII regional populations 
(Table 2). The AMOVA revealed small but statistically significant 
genetic differentiation between the Polish pre-war and modern 
populations (Table 2). When both groups of populations were tested 
for genetic structure separately, only the modern Polish regional 
samples showed genetic homogeneity (Table 2). Regional differentiation
of 10-STR haplotypes in the pre-WWII populations was retained
even if the most linguistically distinct Kashubian speakers were
excluded from the analysis (RST = 0.00899, P = 0.01505; data not
shown). Comparison of Y chromosomes associated with etymologically
Slavic and German surnames (with frequencies provided in
Table 1) did not reveal genetic differentiation within any of the three
Polish regional populations for all three (FST, ΦST and RST) genetic
distances. Moreover, the German surname-related Y chromosomes
were as distant from Bavaria and Mecklenburg as the ones
associated with the Slavic surnames (Supplementary Figure S2). MDS
of pairwise genetic distances showed a clear-cut differentiation
between German and Slavic samples (Figure 2). In addition, the
MDS analysis revealed the pre-WWII populations from northern,
central and southern Poland to be moderately scattered in the plot, in
contrast to modern Polish regional samples, which formed a very
tight, homogeneous cluster (Figure 3).
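The relation between the fixation indices and the percentages of variation reported in Table 2 can be checked with a short sketch. It assumes the standard hierarchical identity (1 - FST) = (1 - FSC)(1 - FCT); the function name is hypothetical, and the input values are the haplogroup-based FCT and FSC from Table 2.

```python
def variance_partition(f_ct, f_sc):
    """Return FST and the three percentage components of a two-level AMOVA."""
    f_st = 1 - (1 - f_sc) * (1 - f_ct)
    among_groups = 100 * f_ct                      # among groups
    among_pops_within_groups = 100 * f_sc * (1 - f_ct)
    within_pops = 100 * (1 - f_st)                 # within populations
    return f_st, among_groups, among_pops_within_groups, within_pops


# Haplogroup-based values from Table 2 (5 Slavic vs 2 German populations).
f_st, among, among_within, within = variance_partition(0.05715, 0.03344)
print(round(f_st, 5), round(among_within, 2), round(within, 2))
```

Under this identity the tabulated FST of 0.08868 and the components of 3.15% and 91.13% follow from FCT and FSC, which is a useful consistency check when reading the table.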
The MPD and WIMP values did not reveal significant reduction in
Y-chromosomal diversity in populations with differential degree of
cultural and/or geographic isolation, that is, Kaszuby, Lusatia and
Kurpie (Table 1). In order to check for the effect of sampling pre-WWII
populations on STR variation, genetic variance (Vp) and
average squared difference (ASD0) were assessed within the most
common haplogroups found in the studied Slavic populations:
R-M17*(xM458) and R-M458. Both parameters reached lower values
in the native pre-WWII populations of the Vistula and Oder basins in
comparison with the modern Polish population studied by Underhill
et al.*° A value comparable to the modern Poles was obtained only in
the case of ASD0 in the R-M17*(xM458) chromosomes from Kaszuby
(Table 3). A median-joining network of our R-M17*(xM458) 17-STR
haplotypes revealed a clearly separated cluster of Y chromosomes,
involving as many as 22 individuals from Kaszuby, as well as several
individuals from other Slavic populations (Supplementary Figure S3).
The observed cluster is likely to represent an unknown R-M17
subclade and explains the high ASD0 value in haplogroup
R-M17*(xM458) among the Kashubes.

BATWING analysis of the Slavic populations of Kaszuby and Lusatia
provided convergent MCMC chains with unimodal distribution and
revealed that their divergence took place 1.7 kya (95% confidence
intervals: 1.4–2.1 kya) and was preceded by 0.6 ky of demographic
expansion with a 4.2% growth rate (Table 4).
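The time estimates above are expressed in years by way of the 31-year male generation interval stated in the methods. A minimal sketch of the conversion in both directions is given below; the function names are hypothetical, and the generation counts are back-calculated for illustration only and were not reported in the study.

```python
GENERATION_YEARS = 31  # male generation interval stated in the methods


def generations_to_kya(generations, generation_years=GENERATION_YEARS):
    """Convert a number of generations into thousands of years."""
    return generations * generation_years / 1000


def kya_to_generations(kya, generation_years=GENERATION_YEARS):
    """Convert thousands of years into a number of generations."""
    return kya * 1000 / generation_years


# Back-calculation for the reported split time and expansion period.
print(round(kya_to_generations(1.7), 1))  # split, about 55 generations
print(round(kya_to_generations(0.6), 1))  # expansion before the split
```

The choice of generation interval scales every date linearly, so a shorter interval would shift all estimates towards the present by the same proportion.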

As both the Sorbs and Kashubes are historically the most closely
related to the extinct Slavic tribes of eastern Germany and neither
directly contributed to the modern German population of Mecklenburg,
it was assumed that the population of Mecklenburg resulted
from admixture of western German (Bavarian as a proxy), Sorbian
and Kashubian populations. All the ancestry estimates were the
highest for the western German population (Supplementary Table
S3). On the other hand, admixture analysis failed to detect considerable
German ancestry in paternal lineages of genetic outliers detected
in the MDS analysis, that is, the Sorbs and Kashubes (Supplementary
Table S4). After inclusion of data from German regional populations
studied by Kayser et al,> the Slavic (Sorbian or Kashubian) ancestry
estimates mR, mL and mY for the pooled eastern German populations
(n = 678) in comparison with the pooled western German
populations (n = 886) ranged from 0.182 to 0.261.


DISCUSSION 

Most molecular anthropological studies concerning early human 
history in Central Europe’?! exploit previously observed 
geographic homogeneity of Polish paternal lineages.* Although it 
was suggested that the homogeneous Polish Y-chromosomal gene 
pool was formed very recently after the massive human resettlements
linked to WWII,? a previous study on a southern Polish
population failed to detect genetic differences between pre-WWII
and post-WWII Y chromosomes in the region.16 However, it should
be noted that the studied region did not experience massive
population exchange and its post-WWII settlers originated mainly
in the neighbouring areas.16 The same authors studied a modern
population of Kaszuby, the most linguistically distinct ethnic group
among modern Poles, and no genetic differentiation within the Polish
population was found.18 Our results are based on pre-WWII regional
populations from four out of five main Polish linguistic/dialectal 
groups (Kashubian, Masovian, Greater Polish and Lesser Polish), and 
demonstrate for the first time that the Polish paternal lineages were 
unevenly distributed within the country before the forced 
resettlements of millions of people during and shortly after
WWII. Small but statistically significant differentiation between the
pre-WWII and modern populations is particularly remarkable taking
into account the fact that modern Polish regional samples comprise
varying ratios of pre-WWII inhabitants and post-WWII settlers.
The observed heterogeneity suggests that precautions should be
taken in order to collect representative population samples from
Poland for evolutionary studies, as well as for forensic purposes in
case of statistical evaluation of genetic evidence concerning regions
densely populated by native pre-WWII inhabitants.

Figure 1 Phylogenetic relationship and frequencies of Y-chromosomal haplogroups in the studied populations. Ka Kaszuby; Ko Kociewie; Ku Kurpie;
Lu Lusatia; Sl Slovakia; Me Mecklenburg; Ba Bavaria. (1) R-M17-derived samples with unknown M458 status owing to permanent lack of PCR product,
which most likely resulted from deletion of the M458 locus, located in very close proximity to the DYS448 marker (independent deletions of DYS448
have been described within different haplogroups*® and two out of the three samples with unknown R-M458 genotypes possess DYS448 null alleles).

Alternatively, the observed substructure could result from the fact
that our pre-WWII samples originated in rural areas that were less
likely to be influenced by migrations than large cities,** whereas
Płoski et al? revealed geographic homogeneity of Y-chromosomal
lineages in general populations of several Polish regions. However, it
should be noted that WWII-mediated resettlements involved both
urban and rural populations. The study by Woźniak et al18 on the
modern population of Kaszuby from villages and small towns did not
detect its distinctiveness from other modern Polish regional samples.
This may be owing to the fact that in 1950, the post-WWII settlers
constituted as many as 36.7% of inhabitants of an area roughly
corresponding to the regions of Kaszuby and Kociewie? (for the
populations studied by Płoski et al,” the 1950 share of post-WWII
settlers ranged from 6.8% in the Cracow region up to 93.8% in the
Wrocław region*). The lack of distinctiveness of this modern rural sample
argues against the rural origin of our pre-WWII Polish
regional populations as the main reason for the detected substructure.

Table 2 AMOVA results for the studied populations (Hg = 39 Y-SNP subclades; Ht17 = 17 Y-STRs) and for previously published data for
Polish pre-war and modern populations (Ht7 = 7 Y-STRs) (Roewer et al;17 Woźniak et al16,18)

Tested structure                      Markers      Statistic         P-value    Percentage of variation
2 groups: 5 Slavic populations vs 2 German populations (this study)
                                      Hg           FCT = 0.05715     0.04812     5.72
                                                   FSC = 0.03344     0.00000     3.15
                                                   FST = 0.08868     0.00000    91.13
                                      Ht17, ΦST    ΦCT = 0.06669     0.05059     6.67
                                                   ΦSC = 0.00902     0.00000     0.84
                                                   ΦST = 0.07510     0.00000    92.49
                                      Ht17, RST    RCT = 0.10529     0.04861    10.53
                                                   RSC = 0.00940     0.00000     0.84
                                                   RST = 0.11370     0.00000    88.63
1 group: 3 Polish pre-war populations (this study)
                                      Hg           FST = 0.01356     0.00109     1.36
                                      Ht17, ΦST    ΦST = 0.00246     0.06693     0.25
                                      Ht17, RST    RST = 0.00749     0.01198     0.75
2 groups: Polish pre-war (a) vs Polish modern (b) populations
                                      Ht7, ΦST     ΦCT = 0.00157     0.01287     0.16
                                                   ΦSC = 0.00158     0.03426     0.16
                                                   ΦST = 0.00314     0.00376    99.69
                                      Ht7, RST     RCT = 0.00201     0.03228     0.20
                                                   RSC = 0.00153     0.12337     0.15
                                                   RST = 0.00354     0.04614    99.65
1 group: Polish pre-war (a) populations
                                      Ht7, ΦST     ΦST = 0.00460     0.01713     0.46
                                      Ht7, RST     RST = 0.00688     0.03475     0.69
1 group: Polish modern (b) populations
                                      Ht7, ΦST     ΦST = 0.00047     0.26792     0.05
                                      Ht7, RST     RST = -0.00042    0.56960    -0.04

Abbreviation: AMOVA, analysis of molecular variance.
(a) Polish pre-war populations: Kaszuby (north, centre and south), Kociewie, Kurpie (this study), Podhale and Sądecczyzna (Woźniak et al16).
(b) Polish modern populations: Kaszuby (Woźniak et al18), Podhale and Sądecczyzna (Woźniak et al16), Gdańsk, Bydgoszcz, Warsaw, Lublin, Cracow, Wrocław (Roewer et al17).

Parameters measuring STR variation within Y-chromosomal haplogroups
are commonly used for dating of SNP mutations in order to
draw conclusions about origins and history of human populations.?>?4
Underhill et al*? observed the highest genetic diversity in
Europe for R-M17*(xM458) and R-M458 subclades in the Vistula and
Oder basins, which correspond roughly to the present-day territory of 
Poland. We examined Y-STR variation within the two subclades 
in pre-WWII Polish regional populations of the Vistula basin 
(Kurpie, Kociewie and Kaszuby) and in a native population of the 
Oder-Elbe basin borderland (Lusatia), and revealed a similarly high
ASD0 value as in the modern Polish population only for
R-M17*(xM458) in Kaszuby, which we explained by the presence of
an unknown subclade detected in the median-joining network. Apart
from R-M17*(xM458) in Kaszuby, genetic diversity for both R-M17
subclades was lower (in several cases much lower) in the native pre-WWII
populations than in the modern one. This may be owing to the
extensive mixing of the Polish population after the post-WWII 
massive resettlements, with millions of modern Poles tracing their 
pre-WWII origin to the Dniester, Dnieper and Neman basins in 
present-day Ukraine, Belarus and Lithuania. 

Kayser et al revealed significant genetic differentiation between
paternal lineages of neighbouring Poland and Germany, which
follows a present-day political border and was attributed to massive
population movements during and shortly after WWII. Although
the very recent origin of the geographic course of the detected genetic
boundary is undoubted, it remained unknown whether
Y-chromosomal diversity in ethnically/linguistically defined Slavic
and German populations, which used to be exposed to intensive
interethnic contacts and to cohabit ethnically mixed territories, was
clinal or discontinuous already before the war. In contrast to the
regions of Kaszuby and Kociewie, which were politically subordinated 
to German states for more than three centuries and before the massive 
human resettlements in the mid-20th century occupied a narrow strip 
of land between German-speaking territories, the Kurpie region 
practically never experienced longer periods of German political
influence and direct neighbourhood with the German populations.
Lusatia was conquered by Germans in the 10th century and since then
was a part of German states for most of its history; the modern 
Lusatians (Sorbs) inhabit a Slavic-speaking island in southeastern 
Germany. In spite of the fact that these four regions differed 
significantly in exposure to gene flow with the German population, 
our results revealed their similar genetic differentiation from Bavaria 
and Mecklenburg. Moreover, admixture estimates showed hardly 
detectable German paternal ancestry in Slavs neighbouring German 
populations for centuries, that is, the Sorbs and Kashubes. However, it 
should be noted that our regional population samples comprised only 
individuals of Polish and Sorbian ethnicity and did not involve a pre-WWII
German minority of Kaszuby and Kociewie, which owing to
forced resettlements in the mid-20th century ceased to exist, and also
did not involve Germans constituting since the 19th century a
majority ethnic group of Lusatia. Thus, our results concern
ethnically/linguistically rather than geographically defined
populations and clearly contrast with the broad-scale pattern of
Y-chromosomal diversity in Europe, which was shown to be
strongly driven by geographic proximity rather than by language.1
They are also consistent with a previous study on autosomal markers,
which provided evidence for clear genetic departure of the Sorbs from
the neighbouring Germans and their genetic similarity to the Slavic-speaking
Poles and Czechs.*? Although data for German-speaking
populations that used to live in the neighbourhood of the Slavs of 
Kaszuby and Kociewie are not available, data from the Sorbs and 
neighbouring Germans could be used as a proxy, and our AMOVA 
results and ancestry estimates suggest that a genetic barrier between 
Slavic and German speakers similar to the one detected by Kayser
et al between modern Poland and Germany might have existed
already before the war. 

European Journal of Human Genetics

Figure 2 MDS analysis of (a) FST values for Y-chromosomal haplogroups and (b) ΦST values for 17-locus Y-STR haplotypes observed in the studied
populations.

Immel et al* revealed German and Slavic surname-associated
strata in the Halle region in southeastern Germany, which was
explained by the 19th century migration from the Polish-speaking
territories. As German surnames are frequently encountered among 
the modern Poles, we have searched for such differentiation within 
the Polish pre-WWII regional populations. Both Slavic and German 
surname carriers revealed regional Y chromosome homogeneity and 
comparable genetic distances from the German populations, which 
suggests that etymologically German surnames in the studied 
populations may result, at least partially, from foreign 
administration and linguistic adaptation (eg, translation, common 
until the end of the 19th century and attested also in the 20th 
century), well documented in historical sources,2°2” rather than
from genetic admixture.

Two main factors are believed to be responsible for the Slavic language 
extinction in vast territories to the east of the Elbe and Saale rivers: 
colonisation of the region by the German-speaking settlers, known in 
historical sources as Ostsiedlung, and assimilation of the local Slavic 
populations, but the contribution of both factors to the formation of a
modern eastern German population has remained highly speculative.
Previous studies on Y-chromosomal diversity in Germany by Roewer
et al17 and Kayser et al’ revealed east-west regional differentiation within
the country with eastern German populations clustering between western
German and Slavic populations but clearly separated from the latter,
which suggested only minor Slavic paternal contribution to the modern
eastern Germans. Our ancestry estimates for the Mecklenburg region
(Supplementary Table S3) and for the pooled eastern German populations,
assessed as being well below 50%, confirm the German
colonisation with replacement of autochthonous populations as the main
reason for extinction of local Slavic vernaculars. The presented results
suggest that early medieval Slavic westward migrations and late medieval 
and subsequent German eastward migrations, which outnumbered and 
largely replaced previous populations, as well as very limited male genetic 
admixture to the neighbouring Slavs (Supplementary Table S4), were 
likely responsible for the pre-WWII genetic differentiation between 
Slavic- and German-speaking populations. Woźniak et al'® compared
several Slavic populations and did not detect such a sharp genetic
boundary in the case of Czech and Slovak males, who occupied a genetically
intermediate position between other Slavic and German populations,
which was explained by early medieval interactions between Slavic and
Germanic tribes on the southern side of the Carpathians. Nevertheless,
paternal lineages from our Slovak population sample were genetically
much closer to their Slavic than German counterparts.

Figure 3 MDS analysis based on ΦST distances for 7-locus Y-STR haplotypes observed in the studied populations compared with data published for 12
Slavic and Germanic populations.16,17 Filled circles indicate modern populations from northern (Gda Gdańsk), central (War Warsaw) and southern Poland
(Cra Cracow). Empty circles indicate pre-WWII populations from northern (KaN, KaC, KaS northern, central, southern Kaszuby; Ko Kociewie), central
(Ku Kurpie) and southern Poland (PoS). Other Slavic populations: Lu Lusatia; Sl western Slovakia. German populations: Me Mecklenburg; Ba western
Bavaria; Gre Greifswald; Ber Berlin; Lei Leipzig; Mai Mainz; Mün Münster. Other Germanic populations: Den Denmark; Got Gotland (Sweden); Ble Blekinge
(Sweden).

Table 3 Vp and ASD0 for 17 Y-STRs in haplogroups R-M17*(xM458)
and R-M458 in native pre-war regional populations of the Vistula and
Oder basins (this study) and in the modern Polish population, studied
by Underhill et al

                   R-M17*(xM458)            R-M458
Population         n     Vp      ASD0       n     Vp      ASD0
Kaszuby            82    0.327   0.454      44    0.160   0.170
Kociewie           45    0.334   0.413      44    0.203   0.233
Kurpie             42    0.324   0.369      55    0.173   0.206
Lusatia            10    0.168   0.206      70    0.176   0.209
Poland (modern)    21    0.424   0.462      29    0.223   0.262

Abbreviations: ASD0, average squared difference; Vp, genetic variance.

Table 4 Times of demographic expansion and split for Y
chromosomes from the populations of Kaszuby and Lusatia

Parameter                                         Modal value with 95% CI
Time to the most recent common ancestor           20.0 kya (16.2-29.3)
Expansion start                                   2.4 kya (1.8-3.2)
Growth rate                                       4.2% (3.1-6.4%)
Split                                             1.7 kya (1.4-2.1)
Time between the expansion start and the split    0.6 ky (0.1-1.4)

Abbreviation: CI, confidence interval.

Coalescence-based analysis of populations sharing common
ancestry, which experienced subsequent cross-migration, leads to
underestimation of their divergence time. On the other hand,
coalescence-based analysis of populations sharing common ancestry,
which experienced subsequent gene flow with unrelated populations, 
is likely to overestimate their divergence time and affect other 
demographic parameters. As the model implemented in BATWING 
does not assume migration between diverged populations, our analysis 
was performed on populations of Kaszuby and Lusatia, which owing
to geographic remoteness and a linguistic barrier remained isolated
from each other and from their German-speaking neighbours. Our
coalescence-based divergence time estimates for the two isolated
western Slavic populations almost perfectly match historical and
archaeological data on the Slavs’ expansion in Europe in the 5th–6th
centuries.* Several hundred years of demographic expansion before the
divergence, as detected by BATWING, support the hypothesis that the
early medieval Slavic expansion in Europe was a demographic event
rather than solely a linguistic spread of the Slavic language.